Incentivizing Exploration with Heterogeneous Value of Money

نویسندگان

  • Li Han
  • David Kempe
  • Ruixin Qiang
چکیده

Recently, Frazier et al. proposed a natural model for crowdsourced exploration of different a priori unknown options: a principal is interested in the long-term welfare of a population of agents who arrive one by one in a multi-armed bandit setting. However, each agent is myopic, so in order to incentivize him to explore options with better long-term prospects, the principal must offer the agent money. Frazier et al. showed that a simple class of policies called time-expanded are optimal in the worst case, and characterized their budget-reward tradeoff. The previous work assumed that all agents are equally and uniformly susceptible to financial incentives. In reality, agents may have different utility for money. We therefore extend the model of Frazier et al. to allow agents that have heterogeneous and non-linear utilities for money. The principal is informed of the agent’s tradeoff via a signal that could be more or less informative. Our main result is to show that a convex program can be used to derive a signal-dependent time-expanded policy which achieves the best possible Lagrangian reward in the worst case. The worst-case guarantee is matched by so-called “Diamonds in the Rough” instances; the proof that the guarantees match is based on showing that two different convex programs have the same optimal solution for these specific instances. These results also extend to the budgeted case as in Frazier et al. We also show that the optimal policy is monotone with respect to information, i.e., the approximation ratio of the optimal policy improves as the signals become more informative.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Habit Formation and Naiveté in Gym Attendance: Evidence from a Field Experiment∗

We extend the gym-attendance study of Charness and Gneezy (2009) by incentivizing subjects to attend the gym for a month, observing their preand post-treatment attendance relative to a control group, and eliciting subjects’ preand post-treatment predictions of their post-treatment attendance. We find a habit formation effect similar to that of Charness and Gneezy in the shortrun, but with subst...

متن کامل

A Novel Combinatorial Approach to Discrete Fracture Network Modeling in Heterogeneous Media

Fractured reservoirs contain about 85 and 90 percent of oil and gas resources respectively in Iran. A comprehensive study and investigation of fractures as the main factor affecting fluid flow or perhaps barrier seems necessary for reservoir development studies. High degrees of heterogeneity and sparseness of data have incapacitated conventional deterministic methods in fracture network modelin...

متن کامل

Separating Well Log Data to Train Support Vector Machines for Lithology Prediction in a Heterogeneous Carbonate Reservoir

The prediction of lithology is necessary in all areas of petroleum engineering. This means that to design a project in any branch of petroleum engineering, the lithology must be well known. Support vector machines (SVM’s) use an analytical approach to classification based on statistical learning theory, the principles of structural risk minimization, and empirical risk minimization. In this res...

متن کامل

Multi-stage product development with exploration, value-enhancing, preemptive and innovation options

We provide a real options framework for the analysis of product development that incorporates research and exploration actions, product attribute value-enhancing actions with uncertain outcome, as well as preemption and innovation options. We derive two-stage analytic formulas and propose a general multi-period solution using a numerical lattice approach. Our analysis reveals that exploration a...

متن کامل

Human Interaction with Recommendation Systems: On Bias and Exploration

Recommendation systems rely on historical user data to provide suggestions. We propose an explicit and simple model for the interaction between users and recommendations provided by a platform, and relate this model to the multi-armed bandit literature. First, we show that this interaction leads to a bias in naive estimators due to selection effects. This bias leads to suboptimal outcomes, whic...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2015